\section{Estimate $P\{\hat{X}_k > a\}$}
Given a statistical sample $X_1,\ldots,X_k$ we would like to estimate 

$$
P\left\{\hat{X}_k>a \right\},
$$
where $\hat{X}_k := \frac{1}{k}(X_1+\cdots+X_k)$, $\{X_i\}$ are i.i.d. and $a>E[X_1]$.\\

\noindent
The objective, to estimate $P\{\hat{X}_k > a\}$, can be formulated as estimating $E[h(Y)]$ where $h(y) = 1_{(y>a)}$ and $Y = \frac{1}{k}(X_1+\cdots+X_k)$. By the law of large numbers $\hat{X}_k\rightarrow E\brackk{X_1} < a$ as $k\rightarrow \infty$, so $P\{\hat{X}_k > a\}\rightarrow 0$ as $k\rightarrow \infty$. The event $\{\hat{X}_k > a\}$ is therefore rare for large $k$, and we expect importance sampling to be a suitable method for estimating its probability.\\

\noindent
In this problem we assume that the $X_i$'s are Gamma(2,3) distributed, which means that they have the following density function

$$
f(x)=\frac{9}{\Gamma(2)}xe^{-3x}, \quad x\ge 0.
$$

\noindent
In the following we let $a=1$ and $k=100$.\\

\noindent
That is, we want to estimate
$$
P\{\hat{X}_{100} > 1\}  = P\{\tfrac{1}{100}(X_1+\cdots+X_{100})>1\}= P\{\tfrac{S_{100}}{100}>1\}= E[1_{\{x>1\}}(\tfrac{S_{100}}{100})],
$$
where $S_{100}=\sum_{i=1}^{100}X_i$. Note that since the $X_i$'s are i.i.d. Gamma(2,3) distributed, the sum $S_{100}$ must be Gamma(200,3) distributed.\\
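The convolution property can be checked empirically. The following is a small sketch in Python (the report's simulations are in \texttt{R}); it compares the sample mean and variance of sums of 100 i.i.d. Gamma(2, rate 3) draws against the Gamma(200, rate 3) moments $200/3$ and $200/9$.

```python
import random

random.seed(1)

# Empirical check (not a proof): a sum of 100 i.i.d. Gamma(2, rate=3)
# variables should behave like a Gamma(200, rate=3) variable, whose
# mean is 200/3 ~ 66.67 and variance 200/9 ~ 22.22.
# random.gammavariate takes (shape, scale), so rate 3 means scale 1/3.
N = 5000
sums = [sum(random.gammavariate(2, 1 / 3) for _ in range(100)) for _ in range(N)]

mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / (N - 1)

print(mean, var)
```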

\noindent
We know the true distribution of $S_{100}$, $\nu^{*100}_0(dx)$, and we want to use the equality $\nu^{*100}_\alpha(dx) = \tfrac{e^{\alpha x}}{\kappa(\alpha)^{100}}\nu^{*100}_0(dx)$ to find the measure under which to simulate $S_{100}$. Here $\kappa(\alpha)$ is the moment-generating function of $X_1$. Using this shifted measure for $S_{100}$ we will then simulate the random variable 
$$
Z_{\alpha}=1_{\{x>1\}}(\tfrac{S_{100}}{100})\kappa(\alpha)^{100}e^{-\alpha S_{100}},
$$
and use the property $E_{\nu^{*100}_\alpha}[Z_\alpha] = P\{\hat{X}_k > a\}$ to estimate the probability from the simulated $Z_\alpha$'s.\\

\noindent
The first thing we need to find is the moment-generating function for a Gamma(2,3) distribution. Under the assumption that $\alpha < 3$, we have
\begin{align*}
\kappa(\alpha) &= E\brackk{e^{\alpha X}}\nonumber\\
 &= \int_0^\infty e^{\alpha x} \frac{3^2}{\Gamma(2)}xe^{-3x}dx\nonumber\\
 &= \int_0^\infty \frac{3^2}{\Gamma(2)}xe^{-(3-\alpha)x}dx\nonumber\\
 &= \frac{3^2}{(3-\alpha)^2}\int_0^\infty \frac{(3-\alpha)^2}{\Gamma(2)}xe^{-(3-\alpha)x}dx\nonumber\\
 &= \frac{3^2}{(3-\alpha)^2},\nonumber
\end{align*}
where the last equality follows because the integrand is the Gamma$(2,3-\alpha)$ density, which integrates to one.
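The closed form can be verified numerically. Below is a Python sketch (the report's code is in \texttt{R}) comparing $\kappa(\alpha)=3^2/(3-\alpha)^2$ against a trapezoidal quadrature of the defining integral; the truncation point and grid size are arbitrary choices made for illustration.

```python
import math

def kappa_closed(alpha):
    # Closed-form MGF of Gamma(2, rate=3), valid for alpha < 3
    return 9.0 / (3.0 - alpha) ** 2

def kappa_numeric(alpha, upper=60.0, n=200_000):
    # Trapezoidal quadrature of int_0^inf e^{alpha x} * 9 x e^{-3x} dx,
    # truncated at `upper`, where the integrand is negligible for alpha <= 1.
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        x = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(alpha * x) * 9.0 * x * math.exp(-3.0 * x)
    return total * h

print(kappa_closed(1.0), kappa_numeric(1.0))  # both ~ 2.25
```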

\noindent
Next we need to find the optimal $\alpha$. To do this we need to consider which properties the optimal $\alpha$ has, and it can be shown that we need to find the $\alpha$ that minimizes the second moment $E_{\nu^{*100}_\alpha}[Z_\alpha^2]$. The second moment satisfies
\begin{align}\label{eq:2ndMoment}
E_{\nu^{*100}_\alpha}\left[\left(1_{\{x>1\}}\left(\tfrac{S_{100}}{100}\right)e^{-\alpha S_{100} + 100 \log(\kappa(\alpha))}\right)^2 \right] \le E_{\nu^{*100}_\alpha}\left[1_{\{x>1\}}\left(\tfrac{S_{100}}{100}\right) e^{-200(\alpha - \log(\kappa(\alpha)))} \right] 
\end{align}
where we have used that $S_{100}>100$ whenever $1_{\{x>1\}}(\tfrac{S_{100}}{100})=1$, so that $e^{-2\alpha S_{100}}\le e^{-200\alpha}$, and that an indicator function squared is just the indicator function. Since minimizing the left hand side of equation~\eqref{eq:2ndMoment} directly is rather complicated (it amounts to minimizing $e^{100 \log(\kappa(\alpha))}\int_{100}^\infty \frac{3^{200}}{\Gamma(200)}x^{199} e^{-(3+\alpha)x}dx$ with respect to $\alpha$) we have chosen to minimize the upper bound on the right hand side. The minimizing value will then be used as an approximation of the optimal value.\\ 

\noindent
It is easily seen that an $\alpha$ minimizing the right hand side of equation~\eqref{eq:2ndMoment} must maximize $g(\alpha)=\alpha-\log(\kappa(\alpha))$. Thus we solve $g'(\alpha)=1-\frac{2}{3-\alpha} = 0$, which is easily seen to have the unique solution $\alpha = 1$. Since $g''(\alpha) = -\frac{2}{(3-\alpha)^2} < 0$ the function $g$ attains its maximum at $\alpha = 1$, which can also be seen in figure~\vref{fig:optalpha}.\\
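The stationarity argument can be confirmed numerically. The sketch below (Python; the report uses \texttt{R}) bisects $g'(\alpha)=1-\tfrac{2}{3-\alpha}$ on $(0, 2.9)$, where $g'$ is strictly decreasing, and checks that the root maximizes $g$.

```python
import math

def g(alpha):
    # g(alpha) = alpha - log(kappa(alpha)) with kappa(alpha) = 9/(3 - alpha)^2
    return alpha - math.log(9.0 / (3.0 - alpha) ** 2)

def g_prime(alpha):
    return 1.0 - 2.0 / (3.0 - alpha)

# Bisection on g'(alpha) = 0 over (0, 2.9); g' is decreasing there,
# positive at 0 and negative at 2.9.
lo, hi = 0.0, 2.9
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g_prime(mid) > 0:
        lo = mid
    else:
        hi = mid

alpha_star = 0.5 * (lo + hi)
print(alpha_star)  # ~ 1.0
```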

\begin{figure}
\centering
\includegraphics[width=10cm]{problem1_1.eps}
\caption{Finding optimal $\alpha$.}
\label{fig:optalpha}
\end{figure}

\noindent
Thus the measure under which we will simulate is given by

\begin{align*}
\nu^{*100}_\alpha(dx) &= \tfrac{e^{\alpha x}}{\kappa(\alpha)^{100}}\nu^{*100}_0(dx)\\
&= \tfrac{e^{x}}{\kappa(1)^{100}} \frac{3^{200}}{\Gamma(200)}x^{200-1}e^{-3x}dx,\quad\alpha = 1\\
&= \left(\tfrac{2}{3}\right)^{200} \frac{3^{200}}{\Gamma(200)}x^{200-1}e^{-2x}dx\\
&= \frac{2^{200}}{\Gamma(200)}x^{200-1}e^{-2x}dx,
\end{align*}
where we have used $\kappa(1)=\tfrac{9}{4}$, so that $\kappa(1)^{100}=(\tfrac{3}{2})^{200}$,
which is easily recognized as the density for the Gamma(200,2) distribution.\\
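As a floating-point sanity check, the following Python sketch (the report's code is in \texttt{R}) compares the log of the exponentially tilted density $\tfrac{e^{x}}{\kappa(1)^{100}}\cdot\text{Gamma}(200,3)$ with the log of the Gamma(200,2) density at a few test points, working in log space via \texttt{math.lgamma} to avoid overflow.

```python
import math

def log_tilted(x):
    # log of e^{x} / kappa(1)^{100} times the Gamma(200, rate=3) density,
    # with kappa(1)^{100} = (9/4)^{100}
    return (x - 100 * math.log(9.0 / 4.0)
            + 200 * math.log(3.0) - math.lgamma(200)
            + 199 * math.log(x) - 3.0 * x)

def log_gamma_200_2(x):
    # log density of Gamma(200, rate=2)
    return 200 * math.log(2.0) - math.lgamma(200) + 199 * math.log(x) - 2.0 * x

for x in (50.0, 100.0, 150.0):
    print(log_tilted(x), log_gamma_200_2(x))  # agree up to rounding
```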

\noindent
Pseudocode for the simulation of $Z_\alpha$ is given in algorithm~\ref{Problem1Algo}. We have used \texttt{R} to perform the simulations; its Gamma generator is numerically stable even when the shape parameter is 200, as in our case, so we can simulate $S_{100}$ directly under the shifted measure. If it were not, we could instead exploit the convolution property of independent Gamma variables and, in each iteration of the for-loop, simulate the 100 $X_i$'s, which are i.i.d. Gamma(2,2) under the shifted measure, and sum them. The code in appendix~\ref{code1} implements both approaches, but since \texttt{R} is very efficient at vector calculations we can afford a larger $N$ when simulating directly from the Gamma(200,2) distribution.\\

\begin{algorithm}
\caption{Estimation of $P\{\hat{X}_k > a\}$ by importance sampling}
\label{Problem1Algo}
\begin{algorithmic}[1]
  \FOR {$i \leftarrow 1$ to $N$}
  \STATE Generate $s_{100}$ $\sim$ Gamma(200,2)
  \STATE $z_i \leftarrow 1_{\{x>1\}}\left(\tfrac{s_{100}}{100}\right)e^{-(s_{100} -200 \log(3/2))}$
  \ENDFOR
  \RETURN $\tfrac{1}{N}\sum_{i=1}^Nz_i$
\end{algorithmic}
\end{algorithm}
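The algorithm translates directly into a few lines of code. The following is a Python sketch of the procedure (the actual simulations in appendix~\ref{code1} are in \texttt{R}); the sample size $N=100{,}000$ and the seed are illustrative choices. The weight is evaluated in log space, $\kappa(1)^{100}e^{-s}=e^{-(s-200\log(3/2))}$, exactly as in algorithm~\ref{Problem1Algo}.

```python
import math
import random

random.seed(7)

# Importance sampling: draw S_100 ~ Gamma(200, rate=2) under the shifted
# measure and weight by kappa(1)^100 * e^{-S_100} = exp(-(S_100 - 200 log(3/2))).
N = 100_000
log_w0 = 200 * math.log(1.5)
z = []
for _ in range(N):
    s = random.gammavariate(200, 0.5)   # shape 200, scale 1/2, i.e. rate 2
    z.append(math.exp(log_w0 - s) if s / 100 > 1 else 0.0)

est = sum(z) / N                         # estimate of P{X_bar_100 > 1}
var_z = sum((v - est) ** 2 for v in z) / (N - 1)
half_width = 1.96 * math.sqrt(var_z / N)  # 95% CI half-width for the mean
skew = (sum((v - est) ** 3 for v in z) / N) / var_z ** 1.5

print(est, est - half_width, est + half_width, skew)
```

With these settings the estimate lands near the $3.38\cdot10^{-10}$ reported below, with a confidence-interval half-width of a few times $10^{-12}$.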

\noindent
We estimate the probability $P\{\hat{X}_{100} > 1\}$ to be $3.38\cdot10^{-10}$, and we estimate the variance of the estimator to be $9.45\cdot10^{-19}$. This gives the following confidence interval for our estimate: $[3.32\cdot10^{-10};\; 3.44\cdot10^{-10}]$. Furthermore we estimate the skewness of our estimator to be 3.67; this skewness can also be seen in figure~\vref{fig:histz}, which shows a histogram of the simulated $Z_i$'s.

\begin{figure}
\centering
\includegraphics[width=10cm]{problem1_2.eps}
\caption{Histogram of the simulated $Z_i$'s.}
\label{fig:histz}
\end{figure}

\newpage
